Nearest-neighbors medians clustering

Authors

  • Daniel Peña
  • Júlia Viladomat
  • Ruben H. Zamar
Abstract

We propose a nonparametric cluster algorithm based on local medians. Each observation is substituted by its local median and this new observation moves toward the peaks and away from the valleys of the distribution. The process is repeated until each observation converges to a fixpoint. We obtain a partition of the sample based on the convergence points. Our algorithm determines the number of clusters and the partition of the observations given the proportion α of neighbors. A fast version of the algorithm where only a subset of the observations from the sample is processed is also proposed. A proof of the convergence from each point to its closest fixpoint and the existence and uniqueness of a fixpoint in a neighborhood of each mode is given for the univariate case. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 349–362, 2012
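The iteration described in the abstract can be illustrated with a minimal univariate sketch. This is not the authors' implementation. It assumes that the local median of a point is the median of its k nearest neighbors in the original sample, with k = ceil(α·n), and that observations whose trajectories end at the same value (within a small tolerance) form one cluster. The function names `local_median` and `nn_medians_cluster` and the parameters `max_iter`, `tol`, and `merge_tol` are hypothetical choices made for this illustration.

```python
import numpy as np

def local_median(x, sample, k):
    """Median of the k sample values closest to x (univariate case)."""
    nearest = np.argsort(np.abs(sample - x), kind="stable")[:k]
    return float(np.median(sample[nearest]))

def nn_medians_cluster(sample, alpha=0.3, max_iter=100, tol=1e-9, merge_tol=1e-6):
    """Move each observation to its local median until nothing changes.

    Assumption: neighbors are always searched in the ORIGINAL sample,
    and k = ceil(alpha * n). Returns (labels, convergence_points).
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    k = max(1, int(np.ceil(alpha * n)))

    points = sample.copy()
    for _ in range(max_iter):
        moved = np.array([local_median(x, sample, k) for x in points])
        converged = np.max(np.abs(moved - points)) <= tol
        points = moved
        if converged:
            break

    # Observations that ended at (numerically) the same value share a label.
    labels = np.empty(n, dtype=int)
    centers = []
    for i in np.argsort(points, kind="stable"):
        if not centers or points[i] - centers[-1] > merge_tol:
            centers.append(points[i])
        labels[i] = len(centers) - 1
    return labels, np.array(centers)

if __name__ == "__main__":
    data = [1, 4, 5, 6, 9, 101, 104, 105, 106, 109]
    labels, centers = nn_medians_cluster(data, alpha=0.5)
    print(labels, centers)
```

On this toy sample with α = 0.5 (so k = 5), every observation in the first group moves to 5 and every observation in the second group moves to 105, so the number of clusters follows from the convergence points rather than being fixed in advance. The multivariate case and the fast subset-based variant mentioned in the abstract are not covered by this sketch.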


Similar resources

An $\ell_1$-Method for Clustering High-Dimensional Data

In general, the clustering problem is NP–hard, and global optimality cannot be established for non–trivial instances. For high–dimensional data, distance–based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high–dimensional spaces. We propose a distance–based iterative method for clustering data in very high–dimensional space, usin...


A Self-adaptive Spectral Clustering Based on Geodesic Distance and Shared Nearest Neighbors

Spectral clustering is a method of subspace clustering which is suitable for the data of any shape and converges to global optimal solution. By combining concepts of shared nearest neighbors and geodesic distance with spectral clustering, a self-adaptive spectral clustering based on geodesic distance and shared nearest neighbors was proposed. Experiments show that the improved spectral clusteri...


Fast PNN-based Clustering Using K-nearest Neighbor Graph

Search for nearest neighbor is the main source of computation in most clustering algorithms. We propose the use of nearest neighbor graph for reducing the number of candidates. The number of distance calculations per search can be reduced from O(N) to O(k) where N is the number of clusters, and k is the number of neighbors in the graph. We apply the proposed scheme within agglomerative clusteri...


An Adaptive Spectral Clustering Algorithm Based on the Importance of Shared Nearest Neighbors

The construction of a similarity matrix is one significant step for the spectral clustering algorithm; while the Gaussian kernel function is one of the most common measures for constructing the similarity matrix. However, with a fixed scaling parameter, the similarity between two data points is not adaptive and appropriate for multi-scale datasets. In this paper, through quantitating the value ...


Feature Selection for Clustering by Exploring Nearest and Farthest Neighbors

Feature selection has been explored extensively for use in several real-world applications. In this paper, we propose a new method to select a salient subset of features from unlabeled data, and the selected features are then adaptively used to identify natural clusters in the cluster analysis. Unlike previous methods that select salient features for clustering, our method does not require a pr...


Journal title:
  • Statistical Analysis and Data Mining

Volume 5

Pages 349–362

Publication date: 2012